Hierarchical Control and Learning for Markov Decision Processes

Author

  • Ronald E. Parr
Abstract

This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semi-Markov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently. This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstract actions specified as HAMs can be optimally refined for new tasks by solving a reduced SMDP. The formal results show that traditional MDP algorithms can be used to optimally refine HAMs for new tasks. This can be achieved in much less time than it would take to learn a new policy for the task from scratch. HAMs complement some novel decomposition algorithms that are presented in this dissertation. These algorithms work by constructing a cache of policies for different regions of the MDP and then optimally combining the cached solutions to produce a global solution that is within provable bounds of the optimal solution. Together, the methods developed in this dissertation provide important tools for producing good policies for large MDPs. Unlike some ad-hoc methods, these methods provide strong formal guarantees. They use prior knowledge in a principled way, and they reduce larger MDPs into smaller ones while maintaining a well-defined relationship between the smaller problem and the larger problem.
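The idea of turning a policy over a region into a single SMDP action can be illustrated with a small sketch. The code below is not one of the dissertation's algorithms; it is plain iterative policy evaluation, assuming a discount factor below 1, and every name in it (`abstract_action_model`, the toy chain MDP, the `"right"` action) is hypothetical. For each start state in the region it estimates the expected discounted reward collected before leaving the region and the discounted probability of leaving through each exit state, which together describe the abstract action.

```python
def abstract_action_model(region, transitions, rewards, policy, gamma, tol=1e-10):
    """Summarize a fixed policy over `region` as one temporally abstract action.

    transitions[(s, a)] maps next states to probabilities; rewards[(s, a)] is
    the one-step reward. Returns (R, F): R[s] is the expected discounted reward
    earned before exiting, and F[s][e] is the expected value of gamma**t for
    exiting at state e after t steps. Assumes 0 <= gamma < 1.
    """
    region = set(region)
    # Exit states are states outside the region reachable under the policy.
    exits = sorted({s2 for s in region
                    for s2 in transitions[(s, policy[s])] if s2 not in region})
    R = {s: 0.0 for s in region}
    F = {s: {e: 0.0 for e in exits} for s in region}
    while True:
        delta = 0.0
        for s in region:
            a = policy[s]
            new_r = rewards[(s, a)]
            new_f = {e: 0.0 for e in exits}
            for s2, p in transitions[(s, a)].items():
                if s2 in region:
                    # Still inside: keep accumulating, one step more discounted.
                    new_r += gamma * p * R[s2]
                    for e in exits:
                        new_f[e] += gamma * p * F[s2][e]
                else:
                    # Left the region through exit state s2.
                    new_f[s2] += gamma * p
            change = max((abs(new_f[e] - F[s][e]) for e in exits), default=0.0)
            delta = max(delta, abs(new_r - R[s]), change)
            R[s], F[s] = new_r, new_f
        if delta < tol:
            return R, F


# Toy chain (an assumption for illustration): states 0..2 form the region,
# state 3 is the exit. "right" succeeds with probability 0.8, else stays put.
region = [0, 1, 2]
policy = {s: "right" for s in region}
transitions = {(s, "right"): {s + 1: 0.8, s: 0.2} for s in region}
rewards = {(s, "right"): -1.0 for s in region}

R, F = abstract_action_model(region, transitions, rewards, policy, gamma=0.9)
for s in region:
    print(s, round(R[s], 4), {e: round(v, 4) for e, v in F[s].items()})
```

The pair `(R, F)` plays the role of a reward and a discounted transition model for the abstract action, so a higher-level problem defined only over region boundaries could, under these assumptions, be solved with ordinary value iteration.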


Similar resources

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...


Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...


Errata Preface Recent Advances in Hierarchical Reinforcement Learning

Decision Making, Guest Edited by Xi-Ren Cao. The Publisher offers an apology for printing an incorrect version of the paper in the special issue and renders this paper as the true and correct paper. Abstract. Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent atte...


Inference strategies for solving semi-Markov decision processes

Semi-Markov decision processes are used to formulate many control problems and also play a key role in hierarchical reinforcement learning. In this chapter we show how to translate the decision making problem into a form that can instead be solved by inference and learning techniques. In particular, we will establish a formal connection between planning in semi-Markov decision processes and infe...


Tree Based Hierarchical Reinforcement Learning

In this thesis we investigate methods for speeding up automatic control algorithms. Specifically, we provide new abstraction techniques for Reinforcement Learning and Semi-Markov Decision Processes (SMDPs). We introduce the use of policies as temporally abstract actions. This is different from previous definitions of temporally abstract actions as we do not have termination criteria. We provide...



Publication date: 1998